-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat: TVF Part 6.5/X Small TVF analyzer/planning changes #26505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR refactors the table function planning and analysis path by introducing properOutputs and copartitioningLists in TableFunctionNode with immutable collections, overhauls pass-through and required column specifications, updates parser/analyzer/planner to consume the new structures, applies minor planner and rewriter enhancements, and adjusts tests to align with renamed functions and updated behaviors. Class diagram for updated TableFunctionNode and related classesclassDiagram
class TableFunctionNode {
-String name
-Map<String, Argument> arguments
-List<VariableReferenceExpression> properOutputs
-List<PlanNode> sources
-List<TableArgumentProperties> tableArgumentProperties
-List<List<String>> copartitioningLists
-TableFunctionHandle handle
+List<VariableReferenceExpression> getOutputVariables()
+List<VariableReferenceExpression> getProperOutputs()
+List<List<String>> getCopartitioningLists()
}
class TableArgumentProperties {
-String argumentName
-boolean rowSemantics
-boolean pruneWhenEmpty
-PassThroughSpecification passThroughSpecification
-List<VariableReferenceExpression> requiredColumns
-Optional<DataOrganizationSpecification> specification
+String getArgumentName()
+PassThroughSpecification getPassThroughSpecification()
+List<VariableReferenceExpression> getRequiredColumns()
+Optional<DataOrganizationSpecification> getSpecification()
}
class PassThroughSpecification {
-boolean declaredAsPassThrough
-List<PassThroughColumn> columns
+boolean isDeclaredAsPassThrough()
+List<PassThroughColumn> getColumns()
}
class PassThroughColumn {
-VariableReferenceExpression variable
-boolean isPartitioningColumn
+VariableReferenceExpression getVariable()
+boolean isPartitioningColumn()
}
TableFunctionNode "1" *-- "*" TableArgumentProperties
TableArgumentProperties "1" *-- "1" PassThroughSpecification
PassThroughSpecification "1" *-- "*" PassThroughColumn
PassThroughColumn "1" *-- "1" VariableReferenceExpression
Class diagram for Field class after removal of newUnqualified methodclassDiagram
class Field {
-Optional<NodeLocation> nodeLocation
-Optional<QualifiedName> relationAlias
-Optional<String> name
-Type type
-boolean aliased
-Optional<QualifiedName> originTable
-Optional<String> originColumnName
-boolean hidden
+Optional<NodeLocation> getNodeLocation()
// ... other methods
}
Class diagram for QueryPlanner class after visibility changeclassDiagram
class QueryPlanner {
+QueryPlanner(Analysis analysis, VariableAllocator variableAllocator, ...) // now public
// ... other methods
}
Class diagram for SimplePlanRewriter context with new getNodeRewriter methodclassDiagram
class SimplePlanRewriter {
+C get()
+SimplePlanRewriter<C> getNodeRewriter()
// ... other methods
}
Class diagram for LocalQueryRunner with updated createCatalog methodclassDiagram
class LocalQueryRunner {
+void createCatalog(String catalogName, String connectorName, Map<String, String> properties)
// ... other methods
}
Class diagram for DescriptorField construction in AstBuilderclassDiagram
class DescriptorField {
+DescriptorField(NodeLocation location, Identifier identifier, Optional<Type> type)
}
Class diagram for StatementAnalyzer verifyRequiredColumns logic changeclassDiagram
class StatementAnalyzer {
-void verifyRequiredColumns(TableFunctionInvocation node, Map<String, List<Integer>> requiredColumns)
// ... logic now uses getVisibleFieldCount instead of getAllFieldCount
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
daf3bd6 to
21652b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
- Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
- Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
- Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
- Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:78-85` </location>
<code_context>
- this.outputVariables = requireNonNull(outputVariables, "outputVariables is null");
- this.sources = requireNonNull(sources, "sources is null");
- this.tableArgumentProperties = requireNonNull(tableArgumentProperties, "tableArgumentProperties is null");
+ this.arguments = ImmutableMap.copyOf(arguments);
+ this.outputVariables = ImmutableList.copyOf(outputVariables);
+ this.sources = ImmutableList.copyOf(sources);
+ this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
+ this.copartitioningLists = copartitioningLists.stream()
+ .map(ImmutableList::copyOf)
+ .collect(toImmutableList());
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Immutable collections are used for constructor arguments, but copartitioningLists is not null-checked.
Add a requireNonNull check for copartitioningLists to prevent possible NullPointerExceptions and maintain consistency with other constructor arguments.
```suggestion
this.arguments = ImmutableMap.copyOf(arguments);
this.outputVariables = ImmutableList.copyOf(outputVariables);
this.sources = ImmutableList.copyOf(sources);
this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
this.copartitioningLists = requireNonNull(copartitioningLists, "copartitioningLists is null").stream()
.map(ImmutableList::copyOf)
.collect(toImmutableList());
this.handle = requireNonNull(handle, "handle is null");
```
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:116` </location>
<code_context>
+ return variables.build();
+ }
+
+ public List<VariableReferenceExpression> getProperOutputs()
{
return outputVariables;
</code_context>
<issue_to_address>
**suggestion:** Method name getProperOutputs is ambiguous.
Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.
Suggested implementation:
```java
public List<VariableReferenceExpression> getOutputVariablesOnly()
{
return outputVariables;
}
```
If there are any calls to `getProperOutputs()` elsewhere in this file or in other files, they should be updated to `getOutputVariablesOnly()` to match the new method name.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:265` </location>
<code_context>
+
+ public static class PassThroughColumn
+ {
+ private final VariableReferenceExpression outputVariables;
+ private final boolean isPartitioningColumn;
+
</code_context>
<issue_to_address>
**nitpick:** Field name outputVariables in PassThroughColumn is misleading.
Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.
</issue_to_address>
### Comment 4
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:278` </location>
<code_context>
+ }
+
+ @JsonProperty
+ public VariableReferenceExpression getOutputVariables()
+ {
+ return outputVariables;
</code_context>
<issue_to_address>
**nitpick:** Method getOutputVariables in PassThroughColumn should be getOutputVariable.
Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.
</issue_to_address>
### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java:539-77` </location>
<code_context>
+ public static class DifferentArgumentTypesFunction
</code_context>
<issue_to_address>
**suggestion (testing):** Missing negative/edge case tests for DifferentArgumentTypesFunction.
Please add tests for error conditions and edge cases, such as missing or invalid arguments, to improve validation coverage for this function.
Suggested implementation:
```java
public static class DifferentArgumentTypesFunction
extends AbstractConnectorTableFunction
{
public static final String FUNCTION_NAME = "different_arguments_function";
public DifferentArgumentTypesFunction()
{
super(
SCHEMA_NAME,
FUNCTION_NAME,
ImmutableList.of(
TableArgumentSpecification.builder()
// Edge/negative case tests for DifferentArgumentTypesFunction
@Test(expected = IllegalArgumentException.class)
public void testMissingRequiredArgument()
{
// Attempt to call the function with missing required argument
// This should throw an IllegalArgumentException or similar
DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
// Simulate missing argument (actual invocation may differ based on API)
function.apply(/* missing required argument */);
}
@Test(expected = ClassCastException.class)
public void testInvalidArgumentType()
{
// Attempt to call the function with an argument of the wrong type
DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
// Simulate invalid type (actual invocation may differ based on API)
function.apply("invalid_type_instead_of_expected");
}
@Test
public void testBoundaryValues()
{
DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
// Test with boundary values, e.g., empty table, nulls, etc.
try {
function.apply(/* boundary value, e.g., empty table or null */);
} catch (Exception e) {
// Assert that the function handles boundary values gracefully
assertTrue(e instanceof IllegalArgumentException || e instanceof NullPointerException);
}
}
```
You may need to adjust the `function.apply(...)` calls to match the actual method signature and argument requirements of `DifferentArgumentTypesFunction`. Also, ensure that the test annotations (`@Test`) are compatible with your test framework (e.g., JUnit).
</issue_to_address>
### Comment 6
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java:1971` </location>
<code_context>
- analyze("SELECT * FROM TABLE(system.two_arguments_function(text => 'foo', number => 1))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases with null and default arguments for two_scalar_arguments_function.
Please add tests for cases with default values, single arguments, and combinations of null and default values to fully cover input scenarios for two_scalar_arguments_function.
Suggested implementation:
```java
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
// Edge cases: null and default arguments
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, 1))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', NULL))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => NULL, number => 1))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => NULL))");
// If the function supports default values, test with omitted arguments
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function())"); // both default
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => DEFAULT, number => DEFAULT))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, DEFAULT))");
// Combinations of null and default
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, DEFAULT))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, NULL))");
// Single argument cases (already present, but ensure both positional and named)
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");
// Invalid cases: too many arguments, all null, etc.
assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: Too many arguments. Expected at most 2 arguments, got 3 arguments", "SELECT * FROM TABLE(system.two_scalar_arguments_function(1, 2, 3))");
assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: All arguments are null", "SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, NULL))");
```
- If `DEFAULT` is not a valid keyword in your SQL dialect, replace it with the appropriate syntax for default arguments.
- If the function does not support omitted arguments or defaults, remove those tests or adjust them to expect failures.
- Ensure that the error messages in `assertFails` match the actual error messages produced by your analyzer for null/default misuse.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java
Show resolved
Hide resolved
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java
Outdated
Show resolved
Hide resolved
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java
Outdated
Show resolved
Hide resolved
|
@mohsaka, @xin-zhang2 : This code looks alright. Lets fix the sourcery-ai review nitpicks. @jaystarshot : PTAL as well. |
21652b4 to
f1da363
Compare
Co-authored-by: kasiafi <[email protected]> Co-authored-by: Xin Zhang <[email protected]>
f1da363 to
8df4d16
Compare
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java:775` </location>
<code_context>
@Override
public void createCatalog(String catalogName, String connectorName, Map<String, String> properties)
{
- throw new UnsupportedOperationException();
+ nodeManager.addCurrentNodeConnector(new ConnectorId(catalogName));
+ connectorManager.createConnection(catalogName, connectorName, properties);
}
</code_context>
<issue_to_address>
**issue (bug_risk):** createCatalog now performs connector registration.
Please verify that error handling and rollback are properly implemented to prevent inconsistent state if connectorManager.createConnection fails.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java:539-77` </location>
<code_context>
+ public static class DifferentArgumentTypesFunction
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding negative and edge case tests for DifferentArgumentTypesFunction.
Please include tests for invalid argument scenarios, such as missing required arguments, incorrect types, duplicate names, and unexpected order, to validate the function's error handling.
Suggested implementation:
```java
public static class DifferentArgumentTypesFunction
extends AbstractConnectorTableFunction
{
public static final String FUNCTION_NAME = "different_arguments_function";
public DifferentArgumentTypesFunction()
{
super(
SCHEMA_NAME,
FUNCTION_NAME,
ImmutableList.of(
TableArgumentSpecification.builder()
}
// Negative and edge case tests for DifferentArgumentTypesFunction
@Test
public void testDifferentArgumentTypesFunctionNegativeCases()
{
// Missing required argument
try {
// Simulate call with missing required argument
callDifferentArgumentsFunction(/* missing required argument */);
fail("Expected exception for missing required argument");
}
catch (IllegalArgumentException | PrestoException e) {
// Expected
}
// Incorrect argument type
try {
// Simulate call with incorrect argument type
callDifferentArgumentsFunction("string_instead_of_table");
fail("Expected exception for incorrect argument type");
}
catch (IllegalArgumentException | PrestoException e) {
// Expected
}
// Duplicate argument names
try {
// Simulate call with duplicate argument names
callDifferentArgumentsFunctionWithDuplicateNames();
fail("Expected exception for duplicate argument names");
}
catch (IllegalArgumentException | PrestoException e) {
// Expected
}
// Unexpected argument order
try {
// Simulate call with unexpected argument order
callDifferentArgumentsFunctionWithWrongOrder();
fail("Expected exception for unexpected argument order");
}
catch (IllegalArgumentException | PrestoException e) {
// Expected
}
}
// Helper methods to simulate function calls (implementations depend on your test framework)
private void callDifferentArgumentsFunction(Object... args)
{
// Simulate invocation of DifferentArgumentTypesFunction with given args
// This should trigger validation logic in the function
}
private void callDifferentArgumentsFunctionWithDuplicateNames()
{
// Simulate invocation with duplicate argument names
}
private void callDifferentArgumentsFunctionWithWrongOrder()
{
// Simulate invocation with arguments in unexpected order
}
```
You will need to implement the helper methods (`callDifferentArgumentsFunction`, `callDifferentArgumentsFunctionWithDuplicateNames`, and `callDifferentArgumentsFunctionWithWrongOrder`) according to your test framework and how you invoke table functions in your tests. Make sure these methods actually trigger the validation logic in `DifferentArgumentTypesFunction`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java
Show resolved
Hide resolved
|
@aditi-pandit Addressed all sourcery-ai comments. |
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mohsaka and @xin-zhang2
@jaystarshot : PTAL.
|
@xin-zhang2 : Can you rebase the code and push again ? I have seen spark on e2e tests error fixed in new Velox builds. |
|
Can you merge this into #26445 or clarify exactly what is split? Those 2 have same changes for eg presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Field.java
|

Description
Smaller changes from #26445 to make the PR more readable. Mainly cleanup/refactoring and some small additions.
Motivation and Context
To make the larger PR simpler. These changes can be standalone.
Impact
Test Plan
Ran updated TestAnalyzer to make sure analysis changes didn't break anything.
Ran TestTableFunctionInvocation to make sure push down table functions aren't broken.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.